In this work, we propose a semi-supervised method for short text clustering, in which we represent texts as distributed vectors with neural networks and use a small amount of labeled data to specify our intention for clustering. We design a novel objective that combines the representation learning process and the k-means clustering process, and we optimize this objective iteratively on both labeled and unlabeled data, repeating three steps until convergence: (1) assign each short text to its nearest centroid based on its representation from the current neural networks; (2) re-estimate the cluster centroids based on the cluster assignments from step (1); (3) update the neural networks according to the objective while keeping the centroids and cluster assignments fixed. Experimental results on four datasets show that our method significantly outperforms several other text clustering methods.
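The three-step loop can be illustrated with a minimal, hypothetical sketch. It is not the authors' method. A linear map `W` stands in for the neural network. The objective is assumed to be the plain squared distance between each representation and its assigned centroid. Labeled texts are assumed to keep their given cluster, and centroids are assumed to be seeded from the labeled examples. All function names are illustrative. This toy objective on its own can shrink all representations toward zero if it is optimized for long, so a real objective would need additional terms.

```python
import random


def encode(W, x):
    """Hypothetical linear 'encoder' h = W x, standing in for the neural network."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(H, centroids, labels):
    """Step 1: labeled texts keep their label (assumption); others go to the nearest centroid."""
    out = []
    for h, y in zip(H, labels):
        if y is not None:
            out.append(y)
        else:
            out.append(min(range(len(centroids)), key=lambda k: sq_dist(h, centroids[k])))
    return out


def update_centroids(H, assignment, centroids):
    """Step 2: each centroid becomes the mean of the representations assigned to it."""
    new = []
    for k, old in enumerate(centroids):
        members = [h for h, c in zip(H, assignment) if c == k]
        if members:
            new.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new.append(old)  # an empty cluster keeps its previous centroid
    return new


def update_encoder(W, X, assignment, centroids, lr):
    """Step 3: one gradient step on mean_i ||W x_i - mu_{c_i}||^2 with c and mu held fixed."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for x, c in zip(X, assignment):
        h = encode(W, x)
        for r in range(len(W)):
            err = 2.0 * (h[r] - centroids[c][r])
            for j, xj in enumerate(x):
                grad[r][j] += err * xj
    return [[w - lr * g / len(X) for w, g in zip(row, grow)]
            for row, grow in zip(W, grad)]


def semi_supervised_kmeans(X, labels, k, dim, iters=20, lr=0.01, seed=0):
    rng = random.Random(seed)
    # Near-identity initialization (an assumption that keeps the toy example predictable).
    W = [[(1.0 if r == j else 0.0) + rng.gauss(0.0, 0.01) for j in range(len(X[0]))]
         for r in range(dim)]
    # Seed centroids from the labeled examples; unlabeled entries (None) match no cluster.
    H = [encode(W, x) for x in X]
    centroids = update_centroids(H, labels, [[0.0] * dim for _ in range(k)])
    assignment = None
    for _ in range(iters):
        H = [encode(W, x) for x in X]
        new_assignment = assign(H, centroids, labels)               # step 1
        centroids = update_centroids(H, new_assignment, centroids)  # step 2
        W = update_encoder(W, X, new_assignment, centroids, lr)     # step 3
        if new_assignment == assignment:  # assignments stopped changing
            break
        assignment = new_assignment
    return assignment, centroids, W
```

For example, two well-separated toy groups with one labeled point each can be clustered as follows:

```python
X = [[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 4.9], [4.9, 5.2]]
labels = [0, None, None, 1, None, None]
assignment, centroids, W = semi_supervised_kmeans(X, labels, k=2, dim=2)
```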